Compositional Neural Network Language Models for Agglutinative Languages
Authors
Abstract
Continuous space language models (CSLMs) have proven successful in speech recognition. With proper training of the word embeddings, words that are semantically or syntactically related are expected to be mapped to nearby locations in the continuous space. In agglutinative languages, words are formed by concatenating stems and suffixes, so compositional modeling is important. However, when trained on word tokens, CSLMs do not explicitly consider this structure. In this paper, we explore compositional modeling of stems and suffixes in a long short-term memory neural network language model. Our proposed models jointly learn distributed representations for stems and endings (concatenations of suffixes) and predict the probability of stem and ending sequences. Experiments on the Turkish broadcast news transcription task show that the proposed models obtain further gains on top of a state-of-the-art stem-ending-based n-gram language model.
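To illustrate the stem-ending decomposition the abstract describes, the sketch below splits pre-segmented words into a stem token and an "ending" token (the concatenation of the word's suffixes), producing the sub-word sequence a stem-ending language model would be trained on. This is a minimal illustration, not the paper's implementation: the `+` segmentation marker, the helper names, and the example words are assumptions, and a real system would rely on a morphological analyzer to produce the segmentation.

```python
def split_stem_ending(segmented_word):
    """Split a pre-segmented word 'stem+suffix1+suffix2' into (stem, ending).

    The ending keeps a leading '+' so word boundaries can be recovered
    from the flat token stream. (Marker convention is an assumption.)
    """
    if "+" not in segmented_word:
        return segmented_word, None  # bare stem, no suffixes
    stem, _, rest = segmented_word.partition("+")
    return stem, "+" + rest

def to_lm_tokens(segmented_sentence):
    """Turn a list of segmented words into the stem/ending token
    sequence that a stem-ending language model would score."""
    tokens = []
    for word in segmented_sentence:
        stem, ending = split_stem_ending(word)
        tokens.append(stem)
        if ending is not None:
            tokens.append(ending)
    return tokens

# 'evlerimizden' ("from our houses") segmented as ev+ler+imiz+den
print(to_lm_tokens(["ev+ler+imiz+den", "geldik"]))
# → ['ev', '+ler+imiz+den', 'geldik']
```

Because endings carry the leading `+`, the original words can be reconstructed from the token stream, which keeps the model's vocabulary small (stems plus a closed set of endings) while remaining lossless.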
Similar resources
Joint PoS Tagging and Stemming for Agglutinative Languages
The number of word forms in agglutinative languages is theoretically infinite and this variety in word forms introduces sparsity in many natural language processing tasks. Part-of-speech tagging (PoS tagging) is one of these tasks that often suffers from sparsity. In this paper, we present an unsupervised Bayesian model using Hidden Markov Models (HMMs) for joint PoS tagging and stemming for ag...
Document Categorization with Modified Statistical Language Models for Agglutinative Languages
In this paper, we investigate the document categorization task with statistical language models. Our study mainly focuses on categorization of documents in agglutinative languages. Due to the productive morphology of agglutinative languages, the number of word forms encountered in naturally occurring text is very large. From the language modeling perspective, a large vocabulary results in serio...
Syllable-level Neural Language Model for Agglutinative Language
Language models for agglutinative languages have always been hindered in the past by the myriad of agglutinations possible for any given word through various affixes. We propose a method to diminish the problem of out-of-vocabulary words by introducing an embedding derived from syllables and morphemes which leverages the agglutinative property. Our model outperforms character-level embedding in perpl...
Learning grammars with recurrent neural networks
Most of the language acquisition models constructed so far are based on traditional AI approaches. In contrast, artificial neural networks (ANNs) have many valuable abilities, such as the ability to learn, generalization capability, and robustness. However, they are poor at representing compositional structures and manipulating them, and are consi...
A Syllable-based Technique for Word Embeddings of Korean Words
Word embedding has become a fundamental component to many NLP tasks such as named entity recognition and machine translation. However, popular models that learn such embeddings are unaware of the morphology of words, so it is not directly applicable to highly agglutinative languages such as Korean. We propose a syllable-based learning model for Korean using a convolutional neural network, in wh...
Journal title:
Volume Issue
Pages -
Publication date: 2016